COSC 388: Machine Learning

Project 3
Fall 2006

Due: Nov 10 @ 5 P.M.
13 points

Implement ID3, the forerunner of C4.5 (i.e., symbolic attributes, information gain for attribute selection, and no pruning). Conduct an evaluation of ID3, naive Bayes, and k-NN using the 1984 Congressional Voting Record: votes.names. The training and testing files are in the archive votes.zip. Compute average measures of performance and measures of dispersion.
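
As a rough illustration of the attribute-selection step, the sketch below computes entropy and information gain for symbolic attributes. It is not a required design. The function names and the representation of an example as a dict from attribute name to value are assumptions made here for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(examples, labels, attribute):
    """Entropy reduction obtained by splitting on one symbolic attribute.

    examples: list of dicts mapping attribute name -> symbolic value
              (an assumed representation, not one the assignment prescribes)
    labels:   list of class labels, parallel to examples
    """
    # Group the class labels by the value each example has for the attribute.
    partitions = {}
    for example, label in zip(examples, labels):
        partitions.setdefault(example[attribute], []).append(label)

    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        (len(part) / len(labels)) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(examples, labels, attributes):
    """Return the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(examples, labels, a))
```

When an attribute separates the classes perfectly, every partition has zero entropy, so its gain equals the entropy of the labels before the split. An attribute unrelated to the class has a gain of zero.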

The implementation of ID3 must be general, meaning that it must work with any data set with symbolic attributes.

I must be able to run your programs on other data sets. You can assume that the data sets will be constructed and named in a manner similar to the pima and votes data sets. Make sure I can run your implementations on such data sets from the command line.
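
One way to meet the command-line requirement is to pass the data files as arguments rather than hard-coding their names. The sketch below is a minimal example under that assumption. The argument names, the optional --names flag, and the placeholder body of main are all hypothetical, and the assignment does not prescribe them.

```python
import argparse


def build_parser():
    """Build a parser that takes the data files as command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Train a classifier on one file and evaluate it on another."
    )
    parser.add_argument("train", help="path to the training file")
    parser.add_argument("test", help="path to the testing file")
    # Hypothetical option: where to find the attribute descriptions.
    parser.add_argument("--names", default=None, help="path to a .names file")
    return parser


def main(argv):
    """Parse the arguments and report which files would be used."""
    args = build_parser().parse_args(argv)
    # Placeholder only: a real program would load the files, train the
    # classifier, and print its performance on the testing file.
    print(f"train={args.train} test={args.test} names={args.names}")
    return 0
```

Calling main(sys.argv[1:]) from the usual __main__ guard lets the same program run unchanged on any data set whose files are supplied on the command line.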

Include in the archive everything needed to compile, run the programs, and reproduce the results.

In a text file named README, include the results of the evaluation and instructions about how to execute your program and reproduce the results.
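
For the results reported in the README, the mean is one common average measure and the sample standard deviation is one common measure of dispersion. The sketch below shows one way to compute both from a list of per-run accuracies. The function name is an assumption, and other measures may be equally acceptable.

```python
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    mean = statistics.mean(accuracies)
    # The sample standard deviation needs at least two values.
    stdev = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, stdev
```

Calling summarize once per learner, on the accuracies from each training and testing pair, gives one row of figures per method for the README.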

Instructions for Submission: In the header comments, provide the following information:

//
// Name
// E-mail Address
// Platform: Windows, OS X, Linux, Solaris (daruma), etc.
// Language/Environment: gcc, g++, java, g77, etc.
//
// In accordance with the class policies and Georgetown's Honor Code,
// I certify that, with the exceptions of the class resources and those
// items noted below, I have neither given nor received any assistance
// on this project.
//
When you are ready to submit your program for grading, create a compressed archive of a directory containing only your project's source, and send it to me by e-mail as an attachment. The directory's name should be the same as your net ID.

For example, assume your net ID is ab123. If the directory p1 contains your project, then rename the directory to ab123.

To make the archive smaller, remove any object files, such as .class, a.out, and .o files.

Use zip, tar, or jar to create an archive:

% zip -r ab123.zip ab123
% tar -cf ab123.tar ab123
% jar -cf ab123.jar ab123
Use jar only for Java projects. If you use jar or tar, then compress the archive by typing
% gzip ab123.tar
% gzip ab123.jar
which creates the file ab123.tar.gz or ab123.jar.gz, respectively.

N.B. If you use zip, then you need to change the extension of your file to something other than .zip, as UIS strips .zip attachments. The extension .piz works pretty well. So you'd rename ab123.zip to ab123.piz.

Attach the file containing your project to an e-mail and send it to me.

Make sure you send a carbon copy of your project to yourself, so you'll have a record of when you submitted your project. Ideally, also keep a copy on a university or department machine. However, make sure that your archive, directory, or files are not readable by others.

Submit your project before 5:00 P.M. on the due date.